 markov model


Relational neurosymbolic Markov models

AIHub

Our most powerful artificial agents cannot be told exactly what to do, especially in complex planning environments. They rely almost exclusively on neural networks to perform their tasks, but neural networks cannot easily be told to obey certain rules or adhere to existing background knowledge. Such uncontrolled behaviour might be nothing more than an annoyance the next time you ask an LLM to generate a schedule for reaching a deadline in two days and it hallucinates that days have 48 hours instead of 24. It can be much more impactful when that same LLM is controlling an agent responsible for navigating a warehouse filled with TNT and it decides to go just a little too close to the storage compartments. Luckily, controlling neural networks has gained a lot of attention in recent years through the development of neurosymbolic AI. Neurosymbolic AI, or NeSy for short, aims to combine the learning abilities of neural networks with the guarantees that symbolic methods based on automated mathematical reasoning offer.


Fast Gibbs Sampling on Bayesian Hidden Markov Model with Missing Observations

Li, Dongrong, Yu, Tianwei, Fan, Xiaodan

arXiv.org Machine Learning

The Hidden Markov Model (HMM) is a widely used statistical model for handling sequential data. However, the presence of missing observations in real-world datasets often complicates the application of the model. The EM algorithm and Gibbs samplers can be used to estimate the model, yet they suffer from various problems, including non-convexity, high computational complexity, and slow mixing. In this paper, we propose a collapsed Gibbs sampler that efficiently samples from the HMM posterior by integrating out both the missing observations and the corresponding latent states. The proposed sampler is fast due to its three advantages. First, it achieves an estimation accuracy that is comparable to existing methods. Second, it can produce a larger Effective Sample Size (ESS) per iteration, which can be justified theoretically and numerically. Third, when the number of missing entries is large, the sampler has a significantly smaller computational complexity per iteration than other methods and is therefore computationally faster. In summary, the proposed sampling algorithm is fast both computationally and theoretically and is particularly advantageous when there are many missing entries. Finally, empirical evaluations based on numerical simulations and real data analysis demonstrate that the proposed algorithm consistently outperforms existing algorithms in terms of time complexity and sampling efficiency (measured in ESS).
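To make the idea of integrating out missing observations concrete, here is a minimal sketch of the standard forward recursion for a discrete HMM in which a missing observation is marginalized by skipping its emission term. This is not the collapsed Gibbs sampler from the paper; the function name `hmm_log_likelihood`, the use of `None` to mark missing entries, and the parameter names `pi`, `A`, `B` are assumptions made for illustration.

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete HMM via the scaled forward recursion.

    obs : sequence of int symbols, with None marking a missing observation
    pi  : (K,) initial state distribution
    A   : (K, K) transition matrix, rows sum to 1
    B   : (K, M) emission matrix, rows sum to 1
    """
    alpha = np.asarray(pi, dtype=float)
    log_lik = 0.0
    for t, y in enumerate(obs):
        if t > 0:
            alpha = alpha @ A          # propagate one step through the chain
        if y is not None:
            alpha = alpha * B[:, y]    # weight by the emission probability
        # A missing observation contributes a factor of 1 for every state,
        # which is what summing the emission row over all symbols gives.
        norm = alpha.sum()
        log_lik += np.log(norm)
        alpha = alpha / norm           # rescale to avoid underflow
    return log_lik

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(hmm_log_likelihood([0, None, 1], pi, A, B))
```

Skipping the emission term keeps the per-step cost unchanged, so this sketch only shows why missing entries need not be imputed to evaluate the likelihood; the computational savings claimed in the abstract come from the authors' sampler, which is not reproduced here.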


On the Stochastic Stability of Deep Markov Models

Neural Information Processing Systems

Deep Markov models (DMMs) are generative models that provide a scalable and expressive generalization of Markov models for representation, learning, and inference problems. However, the fundamental stochastic stability guarantees of such models have not been thoroughly investigated. In this paper, we present a novel stability analysis method and provide sufficient conditions for the stochastic stability of DMMs. The proposed stability analysis is based on the contraction of probabilistic maps modeled by deep neural networks. We connect the spectral properties of the neural networks' weights and the choice of activation functions to the stability and overall dynamic behavior of DMMs with Gaussian distributions. Based on the theory, we propose a few practical methods for designing constrained DMMs with guaranteed stability. We empirically substantiate our theoretical results via intuitive numerical experiments using the proposed stability constraints.


A comparison between initialization strategies for the infinite hidden Markov model

Cortese, Federico P., Rossini, Luca

arXiv.org Machine Learning

Infinite hidden Markov models provide a flexible framework for modelling time series with structural changes and complex dynamics, without requiring the number of latent states to be specified in advance. This flexibility is achieved through the hierarchical Dirichlet process prior, while efficient Bayesian inference is enabled by the beam sampler, which combines dynamic programming with slice sampling to truncate the infinite state space adaptively. Despite extensive methodological developments, the role of initialization in this framework has received limited attention. This study addresses this gap by systematically evaluating initialization strategies commonly used for finite hidden Markov models and assessing their suitability in the infinite setting. Results from both simulated and real datasets show that distance-based clustering initializations consistently outperform model-based and uniform alternatives, the latter being the most widely adopted in the existing literature.
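As a rough illustration of what a distance-based clustering initialization can look like, the sketch below assigns initial hidden states to a univariate series with a small k-means loop. The abstract does not say which clustering methods were compared, so the choice of k-means, the quantile-based starting centers, and the name `kmeans_init_states` are all assumptions; in the infinite setting, `k` would only be the number of states occupied at the start of sampling.

```python
import numpy as np

def kmeans_init_states(x, k, n_iter=20):
    """Initial state labels for a 1-D series from a deterministic k-means."""
    x = np.asarray(x, dtype=float)
    # Start the centers at evenly spaced quantiles so the result is reproducible.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each non-empty cluster's center to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

series = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels, centers = kmeans_init_states(series, k=2)
print(labels, centers)
```

The returned labels would serve as the starting state sequence for the sampler, in place of a uniform random assignment.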




On the Stochastic Stability of Deep Markov Models

Neural Information Processing Systems

This section proposes additional regularization methods for learning stable deep Markov models. The most direct approach is to include the stability conditions as extra penalties in the DMM loss function.
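A penalty of this kind might be sketched as follows. The snippet does not state the exact stability conditions, so the example assumes one simple sufficient condition for contraction: with 1-Lipschitz activations such as ReLU or tanh, the product of the layers' spectral norms upper-bounds the network's Lipschitz constant, and keeping that product below 1 makes the map a contraction. The names `stability_penalty`, `regularized_loss`, `margin`, and `lam` are hypothetical.

```python
import numpy as np

def stability_penalty(weights, margin=0.0):
    """Hinge penalty on the product of the layers' spectral norms.

    Returns 0 when the product is at most 1 - margin, and grows linearly
    with the violation otherwise.
    """
    lipschitz_bound = 1.0
    for W in weights:
        lipschitz_bound *= np.linalg.norm(W, 2)  # largest singular value of W
    return max(0.0, lipschitz_bound - 1.0 + margin)

def regularized_loss(base_loss, weights, lam=1.0):
    """Add the weighted stability penalty to an already computed DMM loss."""
    return base_loss + lam * stability_penalty(weights)

contractive = [0.5 * np.eye(2)]
expansive = [2.0 * np.eye(2), np.eye(2)]
print(stability_penalty(contractive), stability_penalty(expansive))
```

In a training loop the same expression would be written in an autodiff framework so that the penalty contributes gradients; the NumPy version here only shows how the quantity is formed.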



In Appendix A we provide heuristic justification for the scaling of the optimal error rate

Neural Information Processing Systems

In Appendix D we provide the proofs for Theorem 7. In Appendix E we include some useful results for the sake of completeness. Informally, we expect that there is one sign flip (i.e., The top left, top right and bottom left figures show the scaling of the minimax rates of GLM (cf. To begin with the analysis of the estimator in Figure 2, the following lemma is a simple, yet key tool for the proof. It establishes the variance of the random gain S. The proof relies on a sort of self-bounding property (cf.


Thanks all the reviewers for the detailed and thoughtful comments

Neural Information Processing Systems

Thanks to all the reviewers for the detailed and thoughtful comments. HMM-based works [1, 2, 3], all of which proposed methods to estimate alignments from unsegmented data. We've not thoroughly explored to improve the duration predictor and simply follow the same We design the grouped 1x1 convolutions to be able to mix channels. For example, to generate a speech of 5.8 Therefore, adopting parallel TTS models significantly improves the sampling speed of end-to-end systems. In Section 5.3, we showed that varying temperature can change We will add a reference about Viterbi training.